Spoken to Spoken vs. Spoken to Written: Corpus Approach to Exploring Interpreting and Subtitling

Authors

  • Mikhail Mikhailov
  • Hannu Tommola
  • Nina Isolahti
Abstract

This issue of Polibits includes a selection of papers related to semantic information processing. Semantic information processing involves methods and technologies that help machines understand the meaning of information. These methods automatically analyse, extract, generate, interpret, and annotate information contained on the Web, in corpora, in natural language systems, and in other data. The special section of this issue consists of six papers on semantic information processing: the first four present new corpus-based approaches, the fifth analyses opinions, and the last uses classification rules to construct conceptual graphs.

The paper "Spoken to Spoken vs. Spoken to Written: Corpus Approach to Exploring Interpreting and Subtitling" deals with corpora of Finnish-Russian interpreting discourse and subtitling. The software package developed for processing these corpora includes routines written specifically for studying speech transcripts rather than written text. For example, the speaker statistics function calculates the number of words, the number and duration of pauses, and the average speech time of a given speaker.

The paper "Semi-Automatic Parallel Corpora Extraction from Comparable News Corpora" develops an effective technique for extracting a parallel corpus of English and Manipuri, a morphologically rich and resource-constrained Indian language, from comparable news corpora collected from the Web. The paper "A Natural Language Dialogue System for Impression-based Music Retrieval" presents a dialogue system that handles 164 impression words and 14 comparative expressions, such as "a little more" and "more and more", and modifies the most recently used query vector through dialogue. The paper also evaluates the system's effectiveness in an experiment with 35 participants.
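The speaker statistics function described for the Finnish-Russian interpreting corpora can be illustrated with a minimal sketch. It is not the authors' software. It assumes a hypothetical transcript format in which each utterance has a speaker label, start and end times in seconds, and text, with pauses written inline as a duration in parentheses, e.g. "(1.5)". "Average speech time" is interpreted here as the mean duration of a speaker's turns. The names `Utterance`, `SpeakerStats`, and `speaker_stats` are illustrative only.

```python
import re
from dataclasses import dataclass
from statistics import mean
from typing import Sequence

# Hypothetical pause notation (an assumption): seconds in parentheses, e.g. "(1.5)".
PAUSE_RE = re.compile(r"\((\d+(?:\.\d+)?)\)")


@dataclass
class Utterance:
    speaker: str
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


@dataclass
class SpeakerStats:
    words: int             # word tokens, pause markers excluded
    pauses: int            # number of pause markers
    pause_time: float      # total pause duration in seconds
    mean_turn_time: float  # mean turn duration in seconds (our reading of "average speech time")


def speaker_stats(transcript: Sequence[Utterance], speaker: str) -> SpeakerStats:
    """Compute word, pause, and timing statistics for one speaker."""
    turns = [u for u in transcript if u.speaker == speaker]
    if not turns:
        raise ValueError(f"no utterances for speaker {speaker!r}")

    words = 0
    pause_durations: list[float] = []
    for u in turns:
        # Collect pause durations, then strip the markers before counting words.
        pause_durations.extend(float(d) for d in PAUSE_RE.findall(u.text))
        words += len(PAUSE_RE.sub(" ", u.text).split())

    return SpeakerStats(
        words=words,
        pauses=len(pause_durations),
        pause_time=sum(pause_durations),
        mean_turn_time=mean([u.end - u.start for u in turns]),
    )


if __name__ == "__main__":
    # Toy interpreter (INT) / speaker (SPK) exchange, invented for illustration.
    transcript = [
        Utterance("INT", 0.0, 4.0, "добрый день (0.5) уважаемые коллеги"),
        Utterance("SPK", 4.2, 7.0, "hyvää päivää"),
        Utterance("INT", 7.5, 9.5, "спасибо (1.2) продолжим"),
    ]
    print(speaker_stats(transcript, "INT"))
```

Counting words by whitespace after removing pause markers keeps the sketch language-neutral for Finnish and Russian. A real transcription scheme would also need rules for other annotations, such as overlaps, hesitations, and unclear speech.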
The paper "Retrieving Lexical Semantics from Multilingual Corpora" proposes an unsupervised technique for building a WordNet-like lexical resource for annotating parallel corpora. Results are reported for English, German, French, and Greek on the Europarl parallel corpus. The multilingual nature of the approach helps reduce the ambiguity inherent in English words and phrases. The paper "Opinion Mining using Ontologies" analyses opinions with an innovative approach based on ontology fusion and matching. The proposed method allows two enterprises to share and merge the results of opinion analyses of their own products and services. The paper "Learning of Chained Rules for Construction …

Journal:
  • Polibits

Volume 41

Publication date: 2010